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Abstract 

The local histogram transform of an image is a data cube that consists of the histograms of the pixel values that lie 
within a fixed neighborhood of any given pixel location. Such transforms are useful in image processing applications 
such as classification and segmentation, especially when dealing with textures that can be distinguished by the dis- 
tributions of their pixel intensities and colors. We, in particular, use them to identify and delineate biological tissues 
found in histology images obtained via digital microscopy. In this paper, we introduce a mathematical formalism that 
rigorously justifies the use of local histograms for such purposes. We begin by discussing how local histograms can 
be computed as systems of convolutions. We then introduce probabilistic image models that can emulate textures one 
routinely encounters in histology images. These models are rooted in the concept of image occlusion. A simple model 
may, for example, generate textures by randomly speckling opaque blobs of one color on top of blobs of another. Un- 
der certain conditions, we show that, on average, the local histograms of such model-generated-textures are convex 
combinations of more basic distributions. We further provide several methods for creating models that meet these 
conditions; the textures generated by some of these models resemble those found in histology images. Taken together, 
these results suggest that histology textures can be analyzed by decomposing their local histograms into more basic 
components. We conclude with a proof-of-concept segmentation-and-classification algorithm based on these ideas, 
supported by numerical experimentation. 
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1. Introduction 

A local histogram of an image is a histogram of the values of the pixels that lie in a neighborhood of a given pixel's 
location. It indicates the particular combination of pixel intensities or colors that appear in that neighborhood. When 
used as features in an image classification scheme, such histograms can help distinguish one texture from another. 
We, in particular, use them in automated segmentation-and-classification algorithms for digital microscope images of 
biological tissues. 

To be precise, the work presented here was motivated by the need to identify and delineate the various tissues 
exhibited in images of histological sections of teratoma tumors derived from embryonic stem cells, such as the one 
given in Figure 1(a). This image was provided by Dr. Carlos Castro of the University of Pittsburgh and Dr. John 
A. Ozolek of the Children's Hospital of Pittsburgh, who grow and image such teratomas to gain greater insight into 
tissue development. In this image, which is purple-pink from hematoxylin and eosin (H&E) staining, even a layman 
can discern several distinct textures, each corresponding to a distinct tissue type. For each image under study, Drs. 
Castro and Ozolek make use of their years of medical training and experience to identify what tissues are present, 
and to what degree. Moreover, when provided with a point-and-click interface, they can manually segment the image 
according to tissue type, resulting in per-pixel labels such as those given in Figure 1(b). Though straightforward for 
medical experts, such tasks are nevertheless tedious and time-consuming, leading to inconsistencies when working 
with large data sets. It is therefore our goal to automate as much of this process as is possible. Our current algorithm 
is given in [2] and builds upon previous work given in [1, 4, 14]. 
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Figure 1 : (a) A digital microscope image of a H&E-stained tissue section, (b) The histology image has been manually segmented and classified by 
a medical expert, resulting in the per-pixel labels. From darkest to lightest, the labels indicate cartilage, pseudovascular tissue, connective tissue, 
bone, fatty tissue, and background pixels, respectively. Our goal is to automate this segmentation-and-classification process. The purpose of this 
paper is to provide a theoretical justification for using local histograms to achieve this goal. 



Our use of local histograms was motivated by the unique image features found in histology images of teratomas 
derived from primate embryonic stem cells. In layman's terms, these tumors begin as masses of undifferentiated cells 
that are implanted in laboratory animals. Over time, these tumors grow and their cells differentiate into many various 
types — bone, cartilage, skin, etc. — until a point at which they are excised, sectioned, stained and viewed under a 
microscope, resulting in images such as the one in Figure 1(a). As such, these images exhibit a wide variety of tissue 
types, arranged in a seemingly random fashion. Indeed, to a casual observer such images can appear as a jumbled 
mess. In truth however, the arrangement of these tissues is not completely random, and is rather the result of not yet 
well-understood biological mechanisms. Drs. Castro and Ozolek believe that by looking at many such images — many 
sections of many teratomas — they can gain greater insight into these mechanisms. Here, spatial context is crucial: one 
must identify which particular tissue is present at any given point in order to estimate the total amount of each type, 
as well as the degree to which any given type is adjacent to other types. 

In light of these facts, we seek an algorithm which assigns a tissue label to each pixel, thereby segmenting (de- 
lineating) and classifying (identifying) the image at the same time. Indeed, such an algorithm would be useful in a 
broad class of digital pathology applications beyond the teratoma problem [2] . While designing such an algorithm, we 
must keep in mind that often no single pixel contains enough information to uniquely determine a label. Rather, the 
decisions will be made based on features computed over some fixed neighborhood of every given pixel location. To 
determine which specific features to use, it helps to have a closer look at each individual tissue. For example, for the 
1200 X 1200 image given in Figure 1(a) and thumbnailed in Figure 2(a), we zoom in on three tissue types — cartilage, 
connective tissue and pseudovascular tissue — resulting in the 128 x 128 subimages given in Figure 2(b), (c) and (d), 
respectively. Each of these three tissue types exhibits a unique aperiodic texture. For instance, the cartilage texture 
can be regarded as a light purple field speckled with darker reddish-purple blobs; each blob represents an individual 
cell's nucleus. Meanwhile, connective tissue appears as dark purple blobs over a light pink field; pseudovascular 
tissue is similar to connective tissue, but contains additional reddish-pink structures. In particular, these three textures 
exhibit distinct distributions of color, a fact which can quantitatively be confirmed by computing the two-dimensional 
histograms of their red-blue (RB) pixel value pairs, as depicted in Figure 2(f), (g) and (h). 

As certain tissues can be distinguished from others based solely on the distributions of their pixel values, we 
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(e) The RB histogram of (a). (f) The RB histogram of (b). (g) The RB histogram of (c). (h) The RB histogram of (d). 

Figure 2: A 1200 x 1200 histology image exhibiting multiple tissue types (a) and the 256x256 histogram of its red-blue (RB) pixel values (e). As is 
common with H&E staining, the tissues in (a) are purple-pink and so we ignore the green component of these red-green-blue (RGB) images when 
computing (e). This histogram is viewed from above, with red and blue ranging from to 255 on the horizontal and vertical axes, respectively; 
here the height of the histogram is proportional to darkness for the sake of readability. In (b), (c) and (d) we zoom in on three 128 x 128 patches 
extracted from (a), each of which exhibit a single tissue type, namely cartilage, connective tissue and pseudovascular tissue, respectively. Each 
of these three tissue types has a distinct distribution of pixel values, as evidenced by their corresponding RB histograms (f), (g) and (h). These 
256 X 256 histograms are similar to (e), but are only computed over those points of a given type according to the ground truth labels in Figure 1(b). 
In particular, the histogram (f) of cartilage (b) is computed over all points labeled in black in Figure 1(b). We see that cartilage is darker, on 
average, than connective: (f) is distributed more towards the lower left-hand side than (g) is. Moreover, pseudovascular is similar to connective, 
but possesses additional reddish-pink structures, as evidenced by the subdiagonal blob found in (h), but not (g). As such, it is plausible that local 
histograms can serve as discriminating features in segmentation-and-classification algorithms. 

propose to use histograms as image features in a segmentation-and-classification scheme. These histograms must 
be computed locally — over a fixed neighborhood of every pixel location — since global histograms, such as the one 
depicted in Figure 2(e) derived from Figure 2(a), destroy spatial context by mixing all of the individual distributions 
together. A similar issue arises in time-frequency analysis: spectrograms preserve spatial context while Fourier trans- 
forms do not. Indeed, local histograms are philosophically similar to spectrograms: in a neighborhood of a given 
point, the local histogram transform estimates the frequency of occurrence of a given value while the spectrogram 
estimates frequency in the traditional sense. 

The purpose of this paper is to provide a mathematically rigorous justification for the use of local histograms in this 
fashion. To be precise, we regard our images as functions from a finite abelian group X of pixel locations into a second 
finite abelian group J/ of pixel values. That is, our images / are members of the set ^(^, J/) := {f : X ^ J/}. For 
example, the 1200 x 1200, 8-bit red-green-blue (RGB) image given in Figure 1(a) has X = ^^200 ~ ^1200 X ^1200 and 
J/ = ^256' where denotes the cyclic group of integers modulo N. For purple-pink H&E- stained images, we often 
omit the green channel for the sake of computational efficiency, at which point J/ becomes Z^^^. The local histograms 
of an image / are defined in terms of a weighting function, that is, a nonnegatively-valued w e £(X, M) whose values 
sum to one. Specifically, the local histogram transform of / with respect to w is the function hll^f : x J/ ^ R, 

(LH,/)(x, j) := + ^0), (1) 
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where Sy(f(x + x')) = I if f(x -\- x') = y and is otherwise zero. For any fixed x e X, the corresponding cross-section 
of this function, namely (LHw/)(x, -) \ ^ ^ R, counts the number of instances at which / obtains a given value y in 
a w-neighborhood of x. 

In this paper, we show that local histogram transforms (1) are well-suited to the analysis of a particular class of 
textures. In short, we want a rigorous explanation of the following hypothesis: say for the sake of argument that 
80% of the cartilage texture in Figure 2(b) consists of "background" light purple pixels while the remaining 20% 
of pixels lie in a "foreground" of darker-reddish purple blobs; we then expect a local histogram computed over a 
portion of cartilage to be a mixture — convex combination — of 0.8 of the background pixels' distribution with 0.2 of 
the foreground pixels' distribution. Other tissues arise from other distinct decompositions. For example, looking at the 
pseudovascular tissue of Figure 2(d), we might guess it to be 0.5 light pink, 0.25 dark purple and 0.25 reddish-pink. 
We rigorously show that such decompositions of local histograms indeed exist for textures arising from a certain class 
of probabilistic image models; our long-term goal is to exploit this fact in a segmentation-and-classification algorithm. 

To see how to formalize these ideas, it helps to consider a toy example: imagine that at any given pixel location, 
a coin is flipped, with "heads" resulting in a pink pixel value, and "tails" resulting in a purple one. One expects that, 
on average, the local histogram at any point will consist of two peaks: one in the pink portion of J/, and one in the 
purple. Such an image can be regarded as the result of occluding a solid purple image /o with a solid pink one /i : at 
each pixel, the flip of a coin determines whether fi lies on top of /o at that point, or vice versa. More generally, the 
occlusion of a set of N images {fn}^^^ in ^(^, J/) with respect to a given label function (f e £(X,Zn) is: 

(occ^{/.}t"o )W •= Uix){x). (2) 

That is, at any pixel location x, the label ip{x) determines which of the potential pixel values {fn{x)}n=Q actually appears 
in the composite image occ^p{fn}^~^ at that point. 

The main results of this paper are concerned with when the local histograms (1) of a composite image (2) are 
related to the local histograms of the individual /„'s. Though it is unrealistic to expect a clean relation for any fixed (f, 
we can show that these quantities are indeed closely related, provided one averages over all possible label functions 
(f. Indeed, denoting the probability of getting "heads" in the above toy example as p g [0, 1], we would expect the 
volumes of the pink and purple peaks of the composite image's local histograms to be p and 1 -p, respectively. That is, 
LHvt;OCC<^{/o, fi] should be (1 -p)LHvt;/o + pLHw/i, on average. We generalize this idea so as to permit more realistic 
textures with more colors and with spatially-correlated pixels. 

To be precise, fix a set of source images {/«}^~o^ and consider the set {occ^p{fn}^~Q}^ei{x,'LN) possible composite 
images (2) obtained by letting (p be any one of the N^^^ elements of i{X, Z^), where |^| denotes the cardinality of X. 
We refer to a random method for choosing one of these composites as an occlusion model O. Formally speaking, O 
is a random variable version of (f, meaning there exists a probability density function Po : £(X,Zn) [0, 1] such 
that Y^tpeiiXM Po(v^) = 1. For example, imagine three 128 x 128 images /o, /i and /2 which exhibit a nearly constant 
shade of pink, purple and red, respectively. Given any label function (f : Z^^^ Z3 we can produce a corresponding 
128 X 128 composite image occ<^{/o, /i, /2} whose pixels are some mixture of pink, purple and red. For some choices 
of (f the resulting composites will look like the pseudovascular tissue texture given in Figure 2(d). However, even in 
this small example, there are an enormous number of such possible composites — one for each of the 3^^^ possibilities 
for (f — and only a few of these will look like pseudovascular tissue; most will appear as pink-purple-red static. The 
role of the occlusion model O is to assign a probability to each of these possible ^'s in a manner that emphasizes those 
textures one expects to appear in a given tissue while de-emphasizing the rest. 

In this paper, we provide a sufficient hypothesis on the occlusion model O so as to ensure that the local his- 
tograms (1) of a composite image (2) can, on average with respect to Po, be decomposed in terms of the local 
histograms of the individual images. In particular, we focus on the special case where the occlusion model O is flat, 
meaning that on average, the probability that O chooses label ^ at a given pixel location x is equal to the probability 
of choosing n at any other x'; formally, O is flat if there exists scalars {An}n=o ^^^^ 

^ Po(^) = ^, VxG^. (3) 

That is, O is flat if the marginal distributions obtained by fixing any given x g ^ are identical. Note that for any fixed 
X e X, summing (3) over all n yields that Yjn^i = 1- Indeed, at any given pixel location x, the value An represents 
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the probability that the random label function O will have label n at that x. In our toy example where the values 
of if are determined by |^| spatially-independent coin flips, the probability of getting any particular (p e Z2) is 
P(d(^) = p'*^ ^^^'(1 -p)''^!"!'^ ^^^1; substituting this expression into (3), the binomial theorem implies that this model 
is indeed flat with Aq = 1 - p and Ai = p. Note that, if p > ^, the resulting random image ocC(d{/o,/i} will be 
more pink than purple; flatness does not mean that each label is equally likely, but rather that the chance of being 
pink at any given pixel location is the same as at any other location. These concepts in hand, we present one of our 
main results, which formally claims that, on average, the local histograms of composite images produced from flat 
occlusion models are but mixtures of the local histograms of the source images: 

Theorem 1. If(D is flat as in (3), then the expected value of the local histogram transform (1) of a composite image (2) 
is a convex combination of the local histograms of each individual image: 

N-l 

2 v^mui^occ^{fn}t^){x,y) = Yj Ui^^M(x,yy (4) 

From the point of view of our motivating application, the significance of Theorem 1 is that it gives credence to a 
certain type of segmentation-and-classification algorithm. To be precise, given a set of training images which are 
manually segmented and labeled by medical experts, we, for any given tissue type, can compute local histograms 
over regions which are labeled as that type. In light of Theorem 1, it is reasonable to demix — decompose into convex 
combinations — the local histograms of that type into a set of more basic distributions. For example, we expect that 
the local histograms of pseudo vascular tissue (Figure 2(d)) can be demixed into three simpler distributions — one pink, 
another purple and a third reddish-pink — while those of connective tissue (Figure 2(c)) are mixtures of only the first 
two. Once sparse demixings of each tissue type have been found, we then use them to segment and classify: given a 
new image, we assign a label at any given point by determining which particular set of learned distributions its local 
histogram is most consistent with. 

The remainder of our main results are in support of this interpretation of Theorem 1 . Specifically, the next section 
contains several basic results on local histograms. In Section 3, we prove Theorem 1 and also a generalization of it — 
Theorem 4 — to the non-flat case. In Section 4, we provide various methods — Theorems 5, 6 and 7 — for constructing 
flat O's, and some of these produce textures that resemble those found in digital microscope images of histological 
tissues. The final section discusses a preliminary segmentation-and-classification algorithm inspired by Theorem 1 in 
which local histograms are decomposed using principal component analysis (PC A). 

Both local histograms and probabilistic image occlusion models have long been subjects of interest. Theorem 2 
below details how local histograms can be computed as systems of convolutions; a similar result is given in [8], 
and both [8] and [18] discuss how such a computation can be implemented in optical hardware. Recently, local 
histograms have been used in an active contour-based segmentation scheme [16]; this algorithm partitions an image 
into two smoothly bounded regions whose pixel values are maximally separated with respect to the Wasserstein (earth 
mover's) distance. Local histograms have also recently been used as smoothing filters [7]. Though the work we 
present here focuses exclusively on local histograms of the pixel values themselves, an alternative approach is to first 
pass the image through a filter bank and then compute histograms of the resulting values [6, 12]. Local histograms, 
like time-frequency transforms, preserve global spatial context while obscuring all local spatial context, and as such 
they are well-suited to the processing of locally orderless images [5, 9, 10]. We use local histograms to analyze a class 
of textures generated by a certain probabilistic occlusion model; this model, like the dead leaves model [3, 11, 15], 
generates these textures via a sequential superposition of random sets. Our contribution to this body of literature is a 
formalism that unifies the theory of local histograms with that of occlusion models and permits us to rigorously prove 
that local histograms are indeed a useful transform for the analysis of a particular class of textures. 

2. Local histograms 

In this section, we discuss an eflScient means of computing local histograms (1) and discuss several of their basic 
properties. Computing local histograms can be time consuming, especially as X and J/ become large. In particular, 
for a general window w, a direct computation of (1) requires (9(|^p|J/|) operations: (9(1^1) operations for each x e X 
and yeif.A more efficient method is given in Theorem 2 below: (1) can be computed as a system of IJ/I convolutions 
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Figure 3: An example of how to compute local histograms using Theorem 2(a). For the sake of readability, larger numerical values are represented 
by darker shades throughout. The source image (far left) / is 6 x 8 and has grayscale values ranging from to 4. That, is / G i{X, J/) where 
= X Zg and J/ = Z5. Its characteristic function (5) is a {0, l}-valued 6x8x5 data cube whose cross-sections (left column) indicate those 
locations at which / attains any given value. By Theorem 2(a), the 6 x 8 x 5 data cube that contains the local histograms of / (far right) may be 
computed one level at a time (right column) by filtering these binary-valued cross-sections with a real-scalar-valued weighting function (middle 



column). In this simple example, the weighting function is w ■ 
corner of the grid. 



+ I ((^-1,0 + ^1,0 + ^0,-1 + ^0,1), where the origin lies in the upper left-hand 



over X, which only requires (9(|/Y||J/| log operations if discrete Fourier transforms are used. In particular, we filter 
the characteristic function of the graph of /, namely 1/ : ^ x J/ ^ R, 

lf(x,y) := = 6y{f{x)) = | ^J^J = J; (5) 

with the reversal of w g £(X,R), namely w(x) := w(-x). This method for computing local histograms is illustrated 
in Figure 3. Alternatively, (1) can be computed as a single convolution over x J/; here, the tensor product of 
w G R) with 0) e ^(J/, R) is defined as w a; g ^(^ x J/, R), (w oj)(x, y) := w(x)ci)(y). 

Theorem 2. For any w e R), oj e ^(J/, R), / g £(X, J/), xeX^andye J/; 

(a) Local histograms (1) can be evaluated as a system of\}^\ convolutions over X: (LHvt;/)(x, = (w >k 1 f-^y^){x). 

(b) Alternatively, (1) may be computed as a single convolution over X x J/; (^o ® co) ^ LHw/ = {w ® oj) ^ \f. 
In particular, taking co = 6q gives LH^t;/ = (w ^o) 1 /• 

Proof. For (a), replacing x' with -x\ and substituting the relation Sy(f(x - x')) = 1 f-^y}(x - x') into (1) yields: 
(LH,/)(x, = >^(^0^,(/U + x')) = w(-/)1/-m,}(x -x')=Yj w(x')lf-^y}(x - x') = (w * lf-^y})(x). (6) 

x'eX x'eX x'eX 
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For (b), the definition of gives: 

[(^0 ^ CO) * IM^m^y) = ^ (^0 ^ co){x\y){Ul^f){x - x\y - /) = ^ co{y){U\^f){x,y - /). (7) 

(x',/)e^xj/ /ey 

Substituting (6) into (7) and using (5), gives our result: 

[(^0 ®CO)^ Ul^f\x,y) = Yj CO(y)(w ^ lf-i{y-y})(x) 
ye}/ 

= ^ Oj(y) ^ W(x')lf-l{y_y}(X - 

= 2 ^ oj)(x\y)lf(x -x\y-y) 

(x'y)eXx}/ 

= [(w0(^) lf](x,y). □ 
The next result summarizes several other basic properties of local histograms, the proofs of which are given in [2, 13]. 
Proposition 3. For any w e £{X, R) and f e £{X, J/); 

(a) The levels of a local histogram transform sum to 1; for any x ^ X, Z);gj/(LHw/)(x, = 1. 

(b) Local histograms commute with spatial translation T^; for any x ^ X, LH^T^ = T^^'^^LH^ . 

(c) Adding constants to images shifts their local histograms along J/; for any j e J/, LH^C/ + = T^^'^^LH^/ . 

(d) Quantizing an image will bin its local histograms: for any q g ^(J/, J/'), 



[LH,(^o/)](x,y)= ^ (LH,/)(x,j). 



This basic understanding of local histograms in hand, we turn to the theory of applying them to textures generated 
by the probabilistic image occlusion models discussed in the introduction. 



3. Local histograms of randomly-generated textures 

In this section, we rigorously confirm our intuition regarding local histograms of textures generated via random 
occlusions: if a texture, such as that found in the pseudovascular tissue of Figure 2(d), is some sufl&ciently- spatially- 
random combination of 50% pink pixels, 25% purple pixels and 25% red pixels, then its local histograms should, on 
average, be a mixture of three simpler distributions, namely a convex combination of 0.5 of a purely pink distribution 
with 0.25 purely purple and red ones. 

To do this, fix any set of source images [fn}^!^ and let O be any occlusion model as defined in the introduction. 
That is, let O be a random variable version of a label function (p : X Za^, as defined by a probability density 
function Po : i{X, Z^) [0, 1] where Z<^g^(a:,Za^) Po(^) = 1. In the results that follow, a useful quantity to consider is 
the expected value — with respect to Po — of the characteristic function obtained by letting / = ^ in (5): 

U(x,n):= Yj Po(^)Vx,^)= Yj 

Essentially, lo(-^, ^) is the probability that a random label function (f generated by the occlusion model O will assign 
label n to pixel location x. When compared with the definition of flatness (3), we see that O is flat if and only if there 
exist scalars {^n}n=o ^^^^ ^^^^ l^(x,n) = An for all x g ^ and n e Z^. That is, O is flat if and only if l(^(x,n) is 
constant with respect to pixel location x. Having this concept, we present one of our main results: 
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Theorem 4. For any sequence of images {fn}^^^ ^ ^(^, J/), weighting function w and any N -image occlusion model 
O, the expected value of the local histogram (1) of the composite image (2) with respect to w is: 

N-l 

Eo(LH,occo{/„};)L"o )(^')^) = Yj ^)(LHw/Jfe J) + ^, (9) 
where the error term s is hounded by \£\ < ^ ^ w(x')|1<d(-^ + x\ n) - \^{x, n)\. Moreover, 



«=0 x'eX 

N-l 



^lofe^)= 1, (10) 



«=0 

and so (9) states that, on average, the local histograms of the composite image occ^{ fn]^'^ can be approximated by 
convex combinations of local histograms of each individual image fn. 

Proof The expected value of the local histogram (1) of a composite image (2) is: 

Eo(LH,occo{/„}t-o')(x, = Yj ^^^^^ Z >^(^0^,((occ^{/.}t-o )U + x')\ dD 

For any fixed if, x, and x', we have ip{x + x') = ^ for exactly one n. For any fixed x, x' and y, we can therefore split a 
sum of l<^(x + x', n)Sy(fn(x + x')) over all ^ into one summand where n = (p(x -\-x') and the remaining N-l summands 
for which n ^ (f(x -\- x'): 

N-l 

2 + x\n)Sy(fn(x + /)) = + ^0) + Yj ^^^^y^fn^^ + = dy{{0CC^{fn}t^){x + /)), (12) 

where the final equality follows immediately from (2). Substituting (12) into (11) and using (8) yields: 
Eci>(LH,,ocCci>{/„};)L"o )(x,};) = ^ Po(^) ^ w(/)( ^ l^{x + x^ ^)^,(/„(x + /))) 

= YY ^(^'^^y(Mx + xO) Y Po(^)l^(^ + ^) 

n=0 x'eX (fei(X,ZN) 
N-l 

= YY ^(^0^,(/«(^ + x'))U(x + n). (13) 



Rewriting (13) in terms of 6: := ^ ^ w(x')Sy(fn(x -\- x'))[l(^(x -\- x\ n) - l(^(x, n)] gives our first claim (9): 

n^O X'eX 

N-l 

Eo(LH,occo{/.};)L"o )(x, J) = YY 

w(x')Sy(fn(x + x'))l^(x, n)-\-s 

n^O X'eX 
N-l 

= Y lo(-^' ^) Y ^^^')^y^fnix + X')) + E 

n^O X'eX 
N-l 

= Y lo(-^' ^)(LH^/„)(x, y) + 
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For the second claim, we bound e using the triangle inequality and the fact that \dy{fn{x + x'))\ < 1: 



s = 



N-l N-l 



n-O x'eX n-0 x'eX 

Finally, to prove our third claim (10), note that for any fixed x e X,(^) gives: 



Since as previously noted we have (p(x) = n for exactly one n, (14) becomes: ^^l^(x,n) = ^ F^((p) = 1. □ 

«=0 (pe£(X,ZN) 

An example illustrating the direct computation of the left-hand side of (9) is given in Figure 4. Note that Theorem 4 
implies that the error term s in (9) will be small provided the probability loC-^, ^) of assigning label ^ to x changes 
little as X varies over regions smaller than the the support of w. The extreme case of this is when the occlusion model 
O is flat, meaning l^(x,n) is constant with respect to x. In this case, s vanishes entirely, leading to Theorem 1 as 
given in the introduction: 

Proof of Theorem 1. If O is flat, l^(x -\- x\n) = l^(x,n) for all x, x' e X. The error bound in Theorem 4 then gives 
s = 0. Denoting lo(-^, ^) as An in (9) thus yields our claim. □ 

That is, when O is flat, (9) simplifies to (4), and so the in-depth computation of Figure 4 can be replaced by the 
much simpler one depicted in Figure 5. Thus, flatness is indeed an important theoretical assumption for the analysis 
of local histograms of textures generated via random occlusions. It nevertheless remains to be shown that flatness is 
also a realistic assumption from the point of view of our motivating application; this is the topic of the next section. 



4. Flat occlusion models 



Theorem 1 gives some insight into the behavior of the local histograms of images generated via random occlusions. 
However, this result only holds when O is flat (3), namely when its average characteristic function l(^(x, n), as defined 
in (8), is constant with respect to pixel location x, but is still permitted to vary with label value n. In this section, we 
demonstrate that flatness is a reasonable assumption. In particular, we provide a variety of methods for constructing 
flat occlusion models. Some of these models produce textures similar to those encountered in digital microscope 
images of histological tissues. Our first method involves the translation operator : X ^ X, T^(f(x') := (f(x' -x). To 
be precise, we show that an occlusion model O is flat if it is translation- invariant, meaning that its probability density 
function Po satisfies: 

Po(T» = Po(^), G £(X, ZnI xeX. (15) 
Theorem 5. //"O is translation-invariant (15), then O is flat (3). 

Proof We begin by placing an equivalence relation ~ on £(X, Z^), letting (f'^(f when there exists some x e X such 
that (f' = T^(f. Letting denote a set of representatives from the corresponding equivalence classes, we have that for 
all (f' e £(X, Zn), there exists a unique (f eR such that (f' = T^(f. As such, 

U= Po(^)1^ = XZ^^^^'^V- (16) 

Now, fix any (f e'R, and consider the subgroup "W^p = {x e X : T^(f = (f} of the finite abelian group X. Letting X/^W^p 
denote a fixed set of coset representatives of X with respect to W<^, we claim that JS : X/^^p {(p' : (f' ~ (p), 
I3(x) '= T^(fi is a bijection. 

Indeed, to show JS is one-to-one, note that if T^(fi = I3(x) = I3(x') = T^' (fi, then T^"^V = (f, implying x - x' e "W^p', 
since x and x' are both coset representatives of X/^^p, this is a contradiction unless x = x'. Meanwhile, to show jS 
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k=0 



Figure 4: An example of how to compute the left-hand side of (9) explicitly as a probability-weighted sum. For the sake of readability, larger 
numerical values are represented by darker shades throughout. We consider two 2x2, 3-bit images, namely {fn}^'^ in ^(X,^) where = 2, 
?( = Z2XZ2 and J/ = Zg. In this particular example, the values of the /„'s are all distinct, with /o assuming values {0, 1, 2, 3} and f\ assuming 
values {4, 5, 6, 7} (far left). There are A/^''^' =2^ =16 distinct label functions : Z2 x Z2 ^ Z2 (left column) each yielding a composite image 
occ^{/o, /i } (center column); in accordance with (2), we take values from /o in places where (p is white and values from f\ where (p is black. Each of 
these composites has a 2x2 x 8 local histogram transform (1) (right column). Since occlusion (2) is nonlinear, there is no clean relationship between 
the local histograms of any single composite and the local histograms of the source images /o and /i . Nevertheless, under certain hypotheses, we can 
say something about the average of these local histograms (far right) with respect to some probability density function Po on the set ^(Z2 x Z2, Z2) 
of all possible (p's. In particular, if the occlusion model O is flat (3), meaning in this case that the probability-weighted-sum of all (p's is a constant 
function of x, then Theorem 1 states that this average is a convex combination of the local histograms of fo and f\ as depicted in Figure 5. 



is onto, take any (f' ~ (f, and consider a corresponding such that (f' = T^'(f. Taking the unique x e Xl^^p and 
w G such that jc' = x + w, we have: if' = T^V = T^+^^ = T^(T^^) = T> = fi(x). 
Invoking this claim, along with the assumed translation-invariance of O, yields: 

<P'-(P xeX/^V^ xeX/^V^ xeXj^V^ xeXj^V^ 

Again, writing any g ^ as = x + w, where x e Xl^^p and w g W<^, gives: 



Z it'v = Z Z = ( Z 1) Z = i^^i Z It 



x'eX xeX|^V^we^V^ 

Substituting (18) into (17) gives: 



xeX|^V^ 



j]p.(,')v=P.(,) X iT-v = ^Zw 



(18) 



(19) 



xeXI^V^ 
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^ AnLHwfn 



Figure 5: A continuation of the example of Figure 4. When the occlusion model O is flat, Theorem 4 becomes Theorem 1, with (9) simplifying 
to (4). Though each of the 16 distinct composite images shown in Figure 4 has a distinct local histogram transform, the average of these 16 local 
histogram transforms with respect to Po is but a convex combination (right) of the local histograms (center) of the two source images (left). That 
is, when O is flat, the average-over-all-composites local histogram computed in Figure 4 is equal to the average-over-all-sources local histogram 
computed above. 



Since «) = Z I 2^ - x') ^ II | = l'"^' ^ ' ^^"^'^ = "'I = 1^"''""' substituting (19) into (16) gives: 

UM = X Z P.(.')l.(x,n) = 1:^1: lx.v(x,«) = Z 

implying O is flat, since the value of \^{x, n) depends only on n and is independent of x. □ 

Theorem 5 indicates that flatness is not too strong of an assumption. Indeed, one method for producing a flat 
model O is to generalize the coin-flipping example given in the introduction: given any random method for picking 
a number from Z^^ — a probability spinner — produce (f by conducting |^| independent spins. The resulting model O 
is translation-invariant, and therefore flat, since PoC^) is solely determined by the number of times that (p achieves 
each given value n. Other translation-invariant examples abound. For instance, for any fixed we can assign equal 
probability ^ to ip^ and each of its translates, and assign probability to all others; if the source images {fn}^^^ are 
constant, the composite images (2) produced by such a model are all translates of a single image. More generally, we 
can always partition the N^^^ elements of i{X, Z^) into translation-invariant equivalence classes and assign any fixed 
probability to the members of each class, provided we ensure that in the end they all sum to one. For example, for the 
case N = 2 and ^ = Z2 x Z2 depicted in Figure 4, we may partition the 16 possible ^'s into 7 such classes, and pick 
any probabilities {pk}llQ such that pi = p2 = p3 = p4, ps = p6, pv = Ps, P9 = Pio, Pii = P12 = P13 = P14. Armed with 
one method — translation-invariance — for producing flat models O, we now turn to ways of combining known models 
to produce more complicated and realistic ones. 

4.1. Expansion 

Digital microscope images of histological tissues often contain randomly distributed blobs. These blobs corre- 
spond to biological structures: cells, nuclei, etc. The nature of these processes guarantees that the distribution of such 
structures is roughly uniform, both spatially and in terms of color: two cells cannot occupy the same space; cells will 
usually grow and reproduce so as to occupy any empty space; cells in a given tissue all have approximately the same 
size and color patterns. We want to construct flat occlusion models that emulate such textures, since in light of Theo- 
rem 1, doing so would formally justify the demixing of local histograms as part of a segmentation-and-classification 
algorithm. Note that there is a natural method for randomly generating a set of roughly uniformly-distributed points: 
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(a) (f 



(b) Some examples of if/x. 




• f 



(C) * Wx]xeX 




(d) 



(e) Some examples of if/'^. 



(f) * {il^'x]xeX 



Figure 6: Examples of the expansion operation (20), where black denotes the value of 1, and the lighter shade denotes the value of 0. A function 
: A' ^ {0, 1} is given in (a), and can be chosen, for example, via a sequence of independent coin flips. Meanwhile, for each x G .Y, we 
pick a corresponding function ijjx : X ^ {0, 1}. Cropped versions of a few examples of such ij/x'^ are given in (b).The expansion if * {il^x]xex of 
^ by [il/x]xeX is given in (c). Essentially, each point x for which (p{x) = 1 is replaced with the corresponding blob ipx, with the origin of the ipx 
coordinates being translated to x. In the second row, (f) shows the expansion of a second set of points (p' by a second set of blobs {\l/'x]xeX- These 
examples notwithstanding, note that (20) does not require these blobs to be disjoint. We could have, for instance, produced a texture by expanding 
the points in (d) by the blobs in (b). Nevertheless, stronger conclusions can be made if such disjointness is enforced; see Theorem 6. 



flip a coin at each point x. Here, we explore the idea of expanding each of these randomly generated points into a 
given blob. 

To be precise, let (p e Z2) indicate a set of randomly generated points. For each of the points x g ^ for which 
(f(x) = 1, we will replace it with a blob whose shape is indicated by some i//x ^ ^(^, ^2)- The new texture will be the 
union of all such blobs. Formally, given any (f e Z2) and {<Ax}xga: ^ [^(^, ^2)]'^, we define the expansion of (f by 
{i/^x}xeP( to be ^ * {i/^x}xeP( e Z2), 

r, 1 ^/ X f 1' X = X' -^X'\(p(x')=l,l//^^(X'')=1, 

Two examples of this expansion operation are given in Figure 6. Note that expansion itself (20) is not an occlusion 
model. Indeed, (20) is but a way of combining functions in ^(^, Z2) to produce other ones, whereas an occlusion 
model is a random variable O defined by a probability density function Po over Z2). This fact notwithstanding, 
the expansion operation (20) on label functions (f and {i/^x}xe?( does in fact induce a parallel operation on their random 
variable cousins O and ^. To be precise, given two occlusion models O and ^ from X into Z2, we define the expansion 
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of O by ^ to be the occlusion model O * ^ whose probability density function is Po*^ : ^(^, ^2) [0, 1], 

Po*^(cr) := Yj P^^^) n ^^^^ 

Note that the probability that O * ^ will produce a given label function cr depends on the ways in which cr can 
be written as * {^x}xe?( and, moreover, the probability that O and ^ will produce those particular ^'s and i/^x's, 
respectively. In the next result, we verify that (21) indeed defines a probability density function on ^(^, Z2). We 
further show that if O is translation-invariant (15), then O * ^ is translation-invariant which implies that O * ^ is flat 
by Theorem 5. In particular, image models which produce collections of blobs similar to those found in biological 
tissues will indeed be flat provided the distribution that produces the "centers" of these blobs is translation-invariant. 
Moreover, if the flatness of O * ^ is all that is desired, we can weaken the requirement that O be translation-invariant 
so as to only require that O is itself flat, provided O and ^ are effectively disjoint: 

If Pcd(^) > and P^(<Ax) > for all xeX, then (f * {i//x}xe?( = ^ T^<Ax. (22) 

xeX 
(f(x)=l 

Put another way, (22) means that there is only at most one way, with nontrivial probability, in which the x in (20) can 
be written sls x = x' -\- x" where both <^{x!) - 1 and xlf^'ix") - 1. 

Theorem 6.If(D and ^ are occlusion models from X into Z2, then their expansion O * ^, with probability density 
function (21), is as well. Moreover, if(^ is translation-invariant (15), then O*^ is translation-invariant. Furthermore, 
if(S) and ^ are effectively disjoint (22) and either O 6>r ^ is flat (3), then O * ^ is flat. 

Proof. We first show that (21) defines a probability density function, namely that values of Po*^(cr) over all cr in 
£(X, Z2) sum to 1. Since Po is a probability density function by assumption, we have: 

1= Yj P^^^)- 

Similarly, for any fixed x e X,we have: 

1= Yj P^^^-)' ^^4) 

where the subscript "x" on i// indicates that this particular i// is intended to expand (p at the particular point x as opposed 
to at some other point. Taking the product of (23) with the product of (24) over all x yields: 

1=1(1)1^^1= p^(^)n z p^^^-)= z Po(^)np^(^-)' (25) 

(pe^iX, Z2) xeX il/^e£{X,Z2) ipe£{X,Z2) xeX 

where the final quantity in (25) contains all of the cross terms resulting from distributing the product over all sums of 
the form (24). Now, since for each choice of (p and {i/^x}xeX there is exactly one resulting cr = (f ic {i/^x}xeX^ we can 
rewrite (25) in terms of the definition (21) of Po^y, obtaining our claim: 

o-e{(X,Z2) (pe£(X,Z2) xeX crei{X, Z2) 

{<Ax}xe;^e[W,^2)]'^ 

Thus, (21) indeed defines a probability density function, as claimed. 

We next show that the occlusion model O * ^ is translation-invariant, if O is translation-invariant. To do this, we 
claim that if T^cr = ^ * {^x}xex then cr = (T"^^) * {^x+x}xeX' To see this claim, note that 

(t{x - x) = (TV)(x) = (^ * Wx'}x'ex){x) = 1 
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if and only if there exists some x\ x" in X such that x - x' ^- x" , ^{x!) - 1, and \lf^>{x") - 1. Letting x - x - x, 
we thus have that (t{x) = 1 if and only \i x - {x! - x) + x" , where (T"^^)(x' - x) - ip{x! - x + x) = ^{x') - 1 and 
\lfi^x'-x)^x{^") - ^x'{^") - 1, implying cr - (T"^^) * {^x^x\x^x-> as claimed. Having the claim, (21) implies: 

^ Po(^) w p^(<Ax) = Yj ^^^^^ n p^^^^)- 

(^G^(;\^,Z2) x^X ipe£{X,'L2) xeX 

To continue, we make the change of variables (f' := T~^(f and j/^^ := i//x+x' 
Since O is translation-invariant and ]~~| Pxj/(</^^_jf) = ]~~| Pxj/(</^^), we have: 

xgA: xeX 

Po*^(TV) = P^^^') n ^^^^^^ " Po*^(cr), 

(^'G^(.Y,Z2) xeX 

((p')-k{l//'JxeX=(^ 

and so O * ^ is indeed translation-invariant (15), as claimed. 

For our final claim, we assume that O and ^ are eff'ectively disjoint (22) and that either O or ^ is flat. To do so, 
it is helpful to characterize the flatness of an arbitrary occlusion model O from X to Z2 in terms of the corresponding 
function O := Yj(fei(x,Z2) Po(^)v^. Indeed, for any (f : X ^ Z2, (5) may be rewritten as l<^(x, 1) = (f(x) and so: 

To(x,l)= Yj Po(^)1^(x, 1) = Yj Po(^)^(x) = 0(x). (26) 

<pe£(X,Z2) (pe{(X,Z2) 

In light of (26), we claim that O is flat if and only if O is constant. Indeed, if O is flat, then there exists Ai such that 
0(x) = 1(d(x, 1) = /li for all x e X. Conversely, if 0(x) is constant, then there exists Ai such that lo(-^, 1) = 0(x) = Ai 
for all X G ^; by (10), this further implies that lo(-^, 0) = 1 - lo(-^, I) = I - Ai for all x g ^ and so O is flat. 

Having this claim, we show that O * ^ is flat by showing that O * ^ is constant. To do this, we show that if O 
and ^ are efl'ectively disjoint then O * ^ = O ^ ^ where denotes standard convolution over X. According to the 
definition of O * ^ (21) we have: 

^ F^M(r)(r= Z Po(^)(nP^(^-))(^*{^-}-^^)- (^7) 

o-G^(.Y,Z2) o-G^(^,Z2) (fef(X,Z2) ^xeX ' 

{l/^x}xeXm?(,Z2)f 
^M^x]xeX=Cr 



Since any particular choice of (p and {il/x}xex produces a unique cr via * we can simplify (27) to 

= ^ Po(^)(n P^(^Ax))(^ * {^x}xexy 



(28) 



q^ei{X,Z2) ^xeX 

{l/^x}xeXm^,Z2)f 



Moreover, since O and ^ are eff'ectively disjoint (22) we have (p * {il/x}xex = ^ T^Vx' meaning (28) becomes: 

x'eX 
<f(x')^l 

Yj Po(^)(np^^^-))( z ^''^-') 



ipet{X,Z2) ^xeX ^ ^ x'eX 

{«A.}.e^e[^(^,Z2)]'^ 

<^G^(.Y,Z2) x'eX ^{l/^x}xeX^U(^,Z2)]^^XEX ' 

(f(x')^l 
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(29) 



Now, for any fixed x' ^ X such that ip{x') = 1, we factor the corresponding innermost sum in (29) into a product of 
1^1 distinct sums — one for each x e X — to obtain: 



Substituting (30) into (29) then gives: 



0*^= ^ Po(^) ^ T-'^= ^ Po(^)( X = ( Yj Po(^)^)*^ = 0*^. 

if(x')^l ^(x')^l 



Thus, the eff'ective disjointness of O and ^ indeed implies O * ^ = O ^ ^. As such, if we further assume that either 
O or ^ is flat, then either O or ^ is constant, implying in either case that O * ^ is constant and so O * ^ is flat. □ 

4.2. Overlay 

Above, we discussed how the expansion (21) of a binary-valued occlusion model O with another such model 
^ is a new model O * ^ that randomly generates label functions of the form cr = ^ * {i//x}xex as defined in (20). 
Under certain hypotheses. Theorem 6 gives that such models O * ^ are flat, meaning their local histograms can be 
understood in terms of Theorem 1 . Moreover, some examples of these models produce textures that resemble those 
encountered in histological tissues: if /o and fi are roughly constant light purple and dark purple fields, respectively, 
then the composite image occ<^{/o, /i} obtained by picking (f as in Figure 6(f) bears some similarity to an actual image 
of cartilage, such as the one given in Figure 2(b). Taken together, these facts provide some theoretical justification for 
the use of local histograms for the analysis of such tissues. 

There is however a deficit with this theory: due to the nature of the construction (20), models produced by 
expansion (21) can only be binary- valued, and as such are insufl&cient to emulate textures that exhibit three or more 
distinct color modes, such as the pseudovascular tissue depicted in Figure 2(d). In this subsection, we discuss a method 
for laying one occlusion model over another which, amongst other things, permits us to build multivalued models out 
of binary- valued ones. To be precise, for any (f e f(X, '^n^), <A ^ ^(^' '^n^) <^ ^ ^(^^ ^2), we define the overlay of 
(f over i// with respect to cr to be ^#o-<A ^ ^(^^ ^n^+n^)^ 

(^#.^)(x):=| ^^^^^^^^ cr(x) = l. 

Essentially, an overlay (31) is the result of cutting holes out of an image of (p and laying it on top of an image of 
the location of these holes is indicated by cr and the values of ij/ are increased by a factor of N^p so that they cannot be 
confused with those of ip. Examples of this overlay operation are given in Figure 7. 

In a manner similar to the relationship between (20) and (21), we have that (31) naturally induces a parallel 
operation on occlusion models: given probability density functions Po, and P^ on i{X, Z^^), £(X,Zn^) and £(X, Z2), 
respectively, we define the overlay of the occlusion model O over ^ with respect to E to be the new occlusion model 
0#s^ whose probability density function is Fm^^ - ^(^, ^n^+n^) [0, 1], 

FmM^) Yj Po(^)P^(<A)Ps(cr). (32) 

<fei(X,ZN^) 
ilrE{(X,ZN^) 
o-e£(X,Z2) 

In the next result, we verify that (32) indeed defines a probability density function, and moreover that the correspond- 
ing model 0#s^ is flat provided O, ^ and S are flat, meaning that the local histograms (1) of composite images (2) 
produced by such a model will behave according to Theorem 1 . 

Theorem 7. If(!>,^ and S are occlusion models on i{X, Z^^), i{X, Z^^) and i{X, Z2), respectively, then (32) defines 
a probability density function on 1{X, Za^^+a^^). Moreover, if^, ^, and S are flat, then 0#s^ is flat. 
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(a) cr 





(b) m^o-' 




(c) (t' 



(d) (r#^.0 



Figure 7: Two examples of the overlay operation (31). Recall the two {0, l}-valued label functions cr = ★ {^x]xeX and cr' - (p' W^x^X of 
Figure 6(c) and (f) reshown here in (a) and (c), respectively. Further consider a constant function : A' ^ Zi that assigns label to every point in 
X. The overlay (31) of over cr' is given in (b); essentially, cr-shaped holes are cut from and the result is laid over cr' , resulting in a new texture. 
A distinct texture can be produced by cutting cr' -shaped holes out from <j and laying the result over the constant function (d). Overlaying the 
resulting textures with each other can produce even more complicated textures. 



Proof. To show that (32) defines a probabiHty density function on t{X, Za^^+a^^), note that: 



1 = (1)(1)(1)= Yj P^^^) Z ^^^^^ Z P^^^) = Z Po(^)P^WPi:(cr). (33) 

cr^tiX^Zi) 

Noting that for each fixed ^, and cr, there exists exactly one v e £(X, Z^^+n^) such that (f#o-i// = v, (33) becomes: 
1= Z Z Po(^)P^(^)Pi(cr) = Yj 

vef(X,ZN^^N^ ) (pef(X,ZN^ ) ve£{X,ZN^^N^ ) 

ilref(X,ZN^) 
o-e£(X,Z2) 

as claimed. For the second conclusion, assume that O, ^, and S are flat. Our goal is to show that 0#s^ is flat (3), 
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meaning that for any n e Z^^+n^, we want to show that there exists a scalar An such that: 



^ Po#,^(^^) = ^ (34) 



v{x)-n 

for all X ^ X.To see this, note that for any such x and n, we have: 

^ Po#,^(i^)= Yu Z Po(^)P^WPs((^) = 2 Po(^)P^WPs(^^). (35) 

cret{X,Z2) cret{X,Z2) 

Now, in the special case where ^ = 0, . . . , A^<^ - 1, (31) gives that ((f#o-i/^)(x) = nif and only if (fi(x) = n and cr(x) = 0. 
As such, in this case (35) becomes: 

Yj Po#z^(^) = X Po(^)P^(<A)Ps(c^) 

ve£iX,ZN^^N^) <p^f(X,ZN^)Mx>n 
v{x)=n ilfe£{X,ZN^) 

cre£{X,Z2),cr{x)^^ 

^ ^e£{X,ZN^ ) ' ^ ilfe£{X,ZN^ ) ' ^ crefiX, Z2) ' 

(fi(x)=n cr(x)=0 

= ^o,«^s,o- (36) 

If, on the other hand n = A/<^, . . . , A/<^ + A^^ - 1 then (31) gives that ((f#o-i//)(x) = n if and only if i//(x) = n - N^p and 
cr(x) = 1. In this case, (35) becomes: 



Yj Po#z^(^) = Y Po(^)P^(<A)Ps(^) 



ifiE£(X,ZN^) 

v(x)=n i//e{(X,ZN^),i//(x)=n-N^ 
cree{X,Z2\cr{x)^l 



Y Po(^))( Y ^-"^Ai ^ ^^^A 

ipet{X,ZN^ ) ' ^ iffe£{X,Zj,^ ) ' ^ (Te£{X,Z2) ' 

ij/{x)^n-N^ cr(x)=l 

= ^^,n-N^^i:,i' (37) 
Thus, for any x g ^ we either have (36) or (37) meaning 0#s^ is flat (34), as claimed. □ 



5. A local histogram-based segmentation-and-classification algorithm 

In this section, we present a proof-of-concept segmentation-and-classification scheme that is inspired by Theo- 
rem 1. We emphasize that for the algorithm presented here, local histograms are the only image features that are 
computed. That is, the decision of which label to assign to a given pixel is based purely on the distribution of color 
in its surrounding neighborhood. We do this to demonstrate the validity of the concept embodied by Theorem 1 as an 
image processing tool. For algorithms intended for real-world use, such color information should be combined with 
morphological data — size, local and global shape, orientation and organization — in order to obtain better classifica- 
tion accuracies. An example of such an algorithm, accompanied by thorough testing and comparisons against other 
state-of-the-art methods, is given in the sister article [2] to this one; these facts are not reprinted here. 

The concept of Theorem 1 is that the local histograms of certain textures should, on the whole, be able to be 
decomposed in terms of more basic distributions. Indeed, it is reasonable to expect a local histogram computed over a 
region of cartilage (Figure 2(b)) to be a mixture of 0.8 of a "light purple" distribution — a distribution mostly supported 
in portions of J/ that correspond to light purple — with 0.2 of a darker reddish-purple one. Meanwhile, local histograms 
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Figure 8: An example of using PCA of local histograms to perform segmentation and classification of the image given in (a), which is a 3-bit 
quantized version of Figure 1(a). A manually segmented and labeled version of (a) is shown in (b) where black represents cartilage, light gray 
represents connective tissue, dark gray represents pseudovascular tissue, and white represents other tissues that have been ignored in this proof-of- 
concept experimentation. Using (a) as both the training and testing image in a PCA-based classification scheme (41), we obtain the labels shown 
in (c). A similar, but less-accurate classification of (a) can still be obtained if we instead train on (d), resulting in the labels given in (e). 

of Other tissues will correspond to distinct mixtures of other distributions. For example, local histograms computed 
over a region of pseudovascular tissue (Figure 2(d)) might be a mixture of 0.5 of a light pink distribution, with 0.25 
of a dark purple one and 0.25 of a reddish-pink one. 

The algorithm we present here exploits this concept. The first step is to train our classifier. To do so, let K be the 
number of distinct tissue types found in a training image such as Figure 8(a) or (d). For any tissue type k = 1, . . . , ^, 
we compute local histograms {ht,m}^ti about pixel locations {xk;m}^=i that have been labeled as being of that type by 
medical experts. Each ht,m is a nonnegatively- valued function over J/ that sums to one. There are several ways to 
pick the Xk-m'^- One approach is to have the expert choose each point individually. Alternatively, if the expert has 
manually segmented and labeled the entire image (Figure 8(b)), then the Xk;m'^ can be chosen at random from regions 
of type k. The number of local histograms that we compute for type k is somewhat arbitrary; we used repeated 
experimentation to find a sample size large enough to guarantee reliably-decent performance. 

In light of Theorem 1, it would be nice to demix the training local histograms {hk;m}^ti terms of a type- 
dependent class of more basic distributions {gk;n}^Li- That is, we would like to find nonnegatively- valued functions 
8k;n over J/ that sum to one and have the property that for each training local histogram hk;m there exists nonnegative 
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scalars {At,m,n}^=i that themselves sum to one and such that: 

ht,m ~ ^ ^t,m,ngk;n, Vm = 1 , . . . , M;^. (38) 

Unfortunately, computing the gt,n^ that minimize the approximation error in (38) is a nontrivial optimization problem. 
As such, we leave this approach for future work, and instead consider a mathematically-simpler problem in which the 
^k;m,n^ and gk;n^ are permitted to be arbitrary real scalars and vectors, respectively. That is, we perform PCA for each 
tissue type k. To be precise, for each type, we form a |J/| x matrix whose columns are the (vectorized) local 
histograms ht,m less their average hk'. 



Hk(:, m) = hk-m - hk where h = — ^^^ h-m- (39) 

Mk ^ 



m-l 



We then compute the singular value decompositions Hk = U^l^^Vj and identify those left- singular vectors {uk-n]^^^ 
that correspond to some experimentally-determined number Nk of dominant singular values [o-k-n}^li- In this setting, 
the approximation (38) is replaced by: 

_ Nk 

hk-m ^hk + Y^{hk-m - hk, Uk-n)uk-n, Vm = 1, . . . , My^. (40) 

The classical theory of PCA states that the approximation error in (40) is optimally small in the sense that these specific 
Uk-n^ span the particular A^y^-dimensional subspace of ^(J/, R) whose orthogonal projection operator Pk minimizes the 
total squared-error \\hk\m - hk - Pk(hk;m ~ hk)\f'. The vectors hk and {uk;n}^^i in hand, we store them in memory, 
completing the training phase of our classification algorithm. 

To segment and label a given image /, we compute its local histograms (1), obtaining local distributions of color 
/^x • J/ ^ M, hx(y) = (LHw/)(x, y) about every pixel location x e X. At any given x, we then assign a tissue label k(x) 
by finding the tissue type k whose shifted subspace hk + span{wy^;„}^^^ is nearest to h^. Specifically, we let: 



k(x) = argmin \\hx - hk - VX/^x - hk, Uk-,n)uk-n\\ 

_ Nk _ 

= argmin (ll/Z;, - hkf - V Kh^ - hk, utjf) 

/ _ 

= argmin ^ Ih^iy) - hk(y)f - ^ ^{h^iy) - hk(y))uk;n(y) 



k^l,...,K 



' ' (41) 



ye J/ n=l ye}/ 



In implementation, we compute the summations over J/ in (41) as running sums, looping over all ye}/. This 
computational trick greatly reduces our memory requirements: at any given time, we only store a single level of LH^v/. 
By Theorem 2, such a level can be obtained by filtering an indicator function; in the following experimental results, we 
avoided edge artifacts by using a weighted noncyclic method of filtering, namely the *-convolution of [17]. Without 
such a trick, one must store the entire local histogram transform in memory, a daunting task for even modestly- 
sized images: the full local histogram transform of the 1200 x 1200, 8-bit RGB image given in Figure 1(a) is a 
1200 X 1200 X 256 x 256 x 256 array. 

Further computational advantages may be gained by quantizing the image and reducing the dimension of the color 
space. For our particular set of histology images, we experimentally found that we could still obtain good accuracies 
even if we discard the green channel of our purple-pink images, and moreover quantize the 8 -bit red and blue channels 
down to 3-bits apiece. That is, we quantize J/ from Z^^^ to Zg. By Proposition 3, this is equivalent to binning the 
original 1200 x 1200 x 256 x 256 x 256 local histogram array down to a new one of size 1200 x 1200 x 8 x 8. The 
quantized version of Figure 1(a) is given in Figure 8(a); for the sake of readability, a 3-bit quanitized version of the 
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unused green channel was included in this rendering. As a result of this quantization, it only takes a few seconds to 
assign per-pixel labels to a 1200 x 1200 histology image using a MATLAB -based implementation of (41), running 
on standard desktop hardware. For this particular set of images, further color quantization, such as using 2-bit colors 
(J/ = or converting the original image to grayscale (J/ = Z256), results in an unacceptable loss in classification 
accuracy, as do attempts at spatial quantization {X = Z^oo)- 

Two runs of this classification algorithm are depicted in Figure 8. In the first run, we train the classifier on the 
3-bit 1200 X 1200 red-blue image given in Figure 8(a). For the sake of simplicity, we restrict ourselves to ^ = 3 
tissue types: cartilage, connective tissue and pseudovascular tissue; all other tissue types are ignored in the confusion 
matrices given below. For each type k = 1, 2, 3, we randomly choose = 64 points of that type, making use of a 
small number of the 1200^ ground truth labels given in Figure 8(b); edge artifacts are avoided by not picking points 
near the border. For each type, we then perform PCA on the 64 local histograms hk;m of that type, computing an 
average local histogram hk as well as the dominant left-singular vectors of Hk (39). For the sake of simplicity, in a 
given experiment we will use the same number of principal components for each of the three types, that is, A^^^ = N for 
^ = 1, 2, 3. At the same time, we experiment with this number itself, letting N be either 1, 2, 3 or 4. With the training 
complete, we then segment and classify Figure 8(a) using the decision rule (41), resulting in per-pixel labels such as 
the ones given in Figure 8(c) for A/^ = 4. Comparing Figure 8(c) and the ground truth of Figure 8(b), we see both the 
power and limitations of local histograms: color is a big factor in determining tissue type, but by ignoring shape, we 
suff'er from oversmoothing. The accuracy percentages for various choices of N are given by a confusion matrix: 

A^= 1 N = 2 N = 3 N = 4 

Ca Co Ps Ca Co Ps Ca Co Ps Ca Co Ps 

~Ca 77 22 1 87 TI 2 96 3 1 96 3 F 

Co 95 5 91 9 3 94 3 3 92 5 

Ps 2 11 87 2 8 90 6 7 87 5 5 90 

Here each row of the matrix tells us the percentage a certain tissue was labeled as cartilage (Ca), connective tissue 
(Co), and pseudovascular tissue (Ps). In particular, the first three entries of the first row of this table tell us that when 
using a single principal component, those points labeled as cartilage by a medical expert in Figure 8(b) are correctly 
labeled as such by our algorithm 77% of the time, while 22% of it is mislabeled as connective tissue and 1% of it is 
mislabeled as pseudovascular tissue. Note here that we have trained and tested on the same image; such experiments 
indicate the feasibility of our approach in a semi-automated classification scheme in which a medical expert handpicks 
64 points of each given type and lets the algorithm automatically assign labels to the rest. 

The second run of this algorithm is almost identical to the first, with the exception that we use a distinct image 
in the training phase. To be precise, for each of the three tissue types, we perform PCA on the local histograms of 
64 randomly-chosen points of that type in Figure 8(d), making use of its ground truth labels (not pictured). We then 
apply the principal components obtained from Figure 8(d) to generate labels (Figure 8(e)) for Figure 8(a) using the 
decision rule (41). Compared to the first run, the algorithm's performance here is a better indication of its feasibility 
as a fully automated classification scheme, and is summarized by the following confusion matrix: 

A^= 1 N = 2 A^^ 3 A^ = 4 

Ca Co Ps Ca Co Ps Ca Co Ps Ca Co Ps 

~Ca 90 9 1 9l 4 5 90 5 5 83 TI 6~ 

Co 25 61 14 10 62 28 7 79 14 8 70 22 

Ps 30 12 58 4 10 86 4 50 46 2 17 81 

Though the performance in the second run is understandably worse than that of the first, it nevertheless demonstrates 
the real-world potential of the idea exemplified by Theorem 1 : the local histograms of certain types of textures can be 
decomposed into more basic distributions, and this decomposition can serve as an image processing tool. 
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